Distributed Deep Learning
Resources
- https://d2l.ai/chapter_computational-performance/multiple-gpus.html
- https://jhui.github.io/2017/03/07/TensorFlow-GPU/
- https://www.logicalclocks.com/blog/goodbye-horovod-hello-collectiveallreduce
- Twelve ways to fool the masses when reporting performance of deep learning workloads
- Distributed Deep Learning 101: Introduction
Talks
- #TALK ALCF Datascience frameworks: Tensorflow, PyTorch, Keras, and Horovod
- #TALK Scaling Deep Learning for Scientific Workloads on the #1 Summit Supercomputer
- #TALK Scaling Neural Networks Training - Thorsten Kurth
Code
See AI/Data Engineering/Tensorflow#Distributed training
- #CODE Analytics Zoo
    - Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
    - https://analytics-zoo.readthedocs.io/en/latest/index.html
- #CODE Horovod (see the data-parallel training sketch after this list)
- #CODE Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training
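
A minimal Horovod data-parallel training sketch to go with the #CODE entries above, assuming TensorFlow 2.x and Horovod built with TensorFlow support. The model and dataset are toy placeholders, and optimizer wrapping details can vary across Horovod/Keras versions:

```python
# Minimal Horovod data-parallel sketch (toy model/dataset, illustrative only).
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process per GPU, started e.g. with horovodrun/mpirun

# Pin each process to a single local GPU
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

# Toy dataset: each worker trains on its own shard
(x, y), _ = tf.keras.datasets.mnist.load_data()
dataset = (tf.data.Dataset.from_tensor_slices((x[..., None] / 255.0, y))
           .shard(hvd.size(), hvd.rank())
           .shuffle(10000)
           .batch(64))

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Scale the learning rate by the number of workers and wrap the optimizer
# so gradients are averaged with allreduce at every step
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(1e-3 * hvd.size()))
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=opt)

callbacks = [
    # Broadcast initial weights from rank 0 so all workers start identically
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

model.fit(dataset, epochs=2, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```

Launched with one process per GPU, e.g. `horovodrun -np 4 python train.py`.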
References
- #PAPER Evaluation of Deep Learning Frameworks over Different HPC Architectures (Shams 2017)
- #PAPER Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data (Kurth 2017)
- #PAPER Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis (Ben-Nun and Hoefler 2018)
- #PAPER Mesh-TensorFlow: Deep Learning for Supercomputers (Shazeer 2018)
    - #TALK https://www.youtube.com/watch?v=HgGyWS40g-g
    - #CODE Mesh-TensorFlow
    - Goes beyond data-parallel training
    - Supports more sophisticated parallel computations, e.g. big models that do not fit on one device (see the model-parallelism sketch after this entry)
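
The Mesh-TensorFlow entry targets model parallelism rather than data parallelism. As a rough illustration of the underlying idea only (this is plain TensorFlow device placement, not the Mesh-TensorFlow API; layer sizes and device names are arbitrary assumptions), the sketch below splits a wide hidden layer column-wise across two devices:

```python
# Hand-rolled tensor/model parallelism sketch in plain TensorFlow:
# the hidden layer's weight matrix is split column-wise across two devices,
# so neither device has to hold the full layer.
import tensorflow as tf

tf.config.set_soft_device_placement(True)  # fall back to CPU if GPUs are missing

d_in, d_hidden, d_out = 1024, 8192, 10     # arbitrary sizes for the sketch
devices = ["/GPU:0", "/GPU:1"]

# Each device owns half of the hidden layer's columns
shards = []
for i, dev in enumerate(devices):
    with tf.device(dev):
        shards.append(tf.Variable(
            tf.random.normal([d_in, d_hidden // len(devices)], stddev=0.02),
            name=f"w1_shard_{i}"))

with tf.device(devices[0]):
    w2 = tf.Variable(tf.random.normal([d_hidden, d_out], stddev=0.02), name="w2")

@tf.function
def forward(x):
    # Each device computes its slice of the hidden activations...
    parts = []
    for dev, w in zip(devices, shards):
        with tf.device(dev):
            parts.append(tf.nn.relu(tf.matmul(x, w)))
    # ...and the slices are concatenated before the (small) output layer
    h = tf.concat(parts, axis=-1)
    return tf.matmul(h, w2)

logits = forward(tf.random.normal([32, d_in]))
print(logits.shape)  # (32, 10)
```

Mesh-TensorFlow automates this kind of splitting: tensor dimensions are named, and a user-specified layout maps named dimensions onto a mesh of processors, from which the per-device computation and the required communication are derived.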
- #PAPER GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (Huang 2019)
- #PAPER A Quantitative Study of Deep Learning Training on Heterogeneous Supercomputers (Han 2019)
- #PAPER Channel and filter parallelism for large-scale CNN training (Dryden 2019)
- #PAPER Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism (Dryden 2019)
- #PAPER Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training (Li 2019)
- #PAPER Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools (Mayer 2019)
- #PAPER Performance Analysis of Deep Learning Workloads on Leading-edge Systems (Ren 2019)
- #PAPER TensorFlow on State-of-the-Art HPC Clusters: A Machine Learning use Case (Ramirez-Gargallo 2019)
    - https://core.ac.uk/download/pdf/196280993.pdf
    - Compares the MN4, Power9 and Dibona HPC clusters; only CPUs are evaluated (the Power9 GPUs are not)
- #PAPER Exascale Deep Learning for Scientific Inverse Problems (Laanait 2019)
- #PAPER ZeRO: Memory Optimizations Toward Training Trillion Parameter Models (Rajbhandari 2019)
    - #CODE DeepSpeed
        - DeepSpeed is a deep learning optimization library for PyTorch that makes distributed training easy, efficient, and effective (see the sketch below)
        - https://www.deepspeed.ai/
        - https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/
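
A minimal DeepSpeed training-loop sketch to go with the ZeRO/DeepSpeed entry above. The toy model, batch sizes, ZeRO stage and launch command are illustrative assumptions; exact config keys and the `deepspeed.initialize` signature should be checked against the docs for the installed version:

```python
# Minimal DeepSpeed + ZeRO sketch (toy model; config values are illustrative).
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 10))

ds_config = {
    # train_batch_size = micro-batch (8) x grad accumulation (1) x 8 workers
    "train_batch_size": 64,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "fp16": {"enabled": True},
    # ZeRO stage 2: partition optimizer states and gradients across workers
    "zero_optimization": {"stage": 2},
}

# deepspeed.initialize wraps the model/optimizer in a distributed engine
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config)

for step in range(10):
    # Random micro-batch as a stand-in for a real data loader
    x = torch.randn(8, 1024, device=model_engine.device, dtype=torch.half)
    y = torch.randint(0, 10, (8,), device=model_engine.device)
    loss = torch.nn.functional.cross_entropy(model_engine(x), y)
    model_engine.backward(loss)   # handles gradient averaging/partitioning
    model_engine.step()           # optimizer step + ZeRO bookkeeping
```

Launched e.g. with `deepspeed --num_gpus=8 train.py`, so that the configured train_batch_size matches 8 workers with a micro-batch of 8.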
- #PAPER Towards a Scalable and Distributed Infrastructure for Deep Learning Applications (Hasheminezhad 2020)
    - Phylanx Deep Learning Framework
    - Good comparison with respect to the SOTA
    - Phylanx provides a high-productivity, debuggable, Python-based interactive interface (JetLag)
    - Tests are CPU-only; does it support GPUs?
- #PAPER Distributed Training of Deep Learning Models: A Taxonomic Perspective (Langer 2020)
- #PAPER Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training (Bian 2021)
- #PAPER Pathways: Asynchronous Distributed Dataflow for ML (Barham 2022)
- #PAPER Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? (Tay 2022)